Biostatistics For Dummies, 2nd Edition (Monika Wahi, John Pezzullo)

106 PART 3 Getting Down and Dirty with Data

Dealing with more than two levels in a category

When a categorical variable has more than two levels (like the Type of Caregiver or

Likert agreement scale examples we describe in the earlier section “Looking at Levels

of Measurement”), data storage gets even more interesting. First, you have to

ask yourself, “Is this variable a Choose only one or Choose all that apply variable?”

The coding is completely different for these two kinds of multiple-choice

variables.

You handle the Choose only one situation just as we describe for Type of Caregiver in

the preceding section: you establish a numeric code for each alternative. For the

Likert scale example, if the item asked about patient satisfaction, you could have a

categorical variable called PatSat, with five possible values: 1 for strongly disagree,

2 for somewhat disagree, 3 for neither agree nor disagree, 4 for somewhat agree,

and 5 for strongly agree. And for the Type of Caregiver example, if only one kind of

caregiver is allowed to be chosen from the three choices of nurse, physician, or

social worker, you can have a categorical variable called CaregiverType with three

possible values: 1 for nurse, 2 for physician, and 3 for social worker. Depending

upon the study, you may also choose to add a 4 for other, and a 9 for unknown

(9, 99, and 999 are codes conventionally reserved for unknown). If you find

unexpected values, it is important to research and document what these mean to

help future analysts encountering the same data.
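The Choose only one coding described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the code book mirrors the CaregiverType codes from the text, while the sample column, the function name check_codes, and the "UNDOCUMENTED" label are assumptions made up for the example.

```python
# Code book for a "choose only one" variable, using the codes from the text.
CAREGIVER_CODES = {
    1: "nurse",
    2: "physician",
    3: "social worker",
    4: "other",
    9: "unknown",
}


def check_codes(values, codebook):
    """Return the sorted distinct values that are not defined in the code book."""
    return sorted({v for v in values if v not in codebook})


# Hypothetical CaregiverType column; the 7 is a deliberately unexpected value.
caregiver_type = [1, 2, 3, 9, 2, 7, 4]

# Values to research and document before analysis.
unexpected = check_codes(caregiver_type, CAREGIVER_CODES)

# Translate codes to labels, marking anything not in the code book.
labels = [CAREGIVER_CODES.get(v, "UNDOCUMENTED") for v in caregiver_type]

print(unexpected)
print(labels)
```

Keeping the code book in one place means the same dictionary can both label the data and flag values that still need to be researched and documented.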

But the situation is quite different if the variable is Choose all that apply. For the

Type of Caregiver example, if the patient is being served by a team of caregivers,

you have to set up your database differently. Define separate variables in the database

(separate columns in Excel), one for each possible category value. Imagine

that you have three variables called Nurse, Physician, and SW (the SW stands for

social worker). Each variable is a two-value category, also known as a two-state

flag, and is populated as 1 for having the attribute and 0 for not having the attribute.

So, if participant 101’s care team includes only a physician, participant 102’s

care team includes a nurse and a physician, and participant 103’s care team

includes a social worker and a physician, the information can be coded as shown

in the following table.

Subject    Nurse    Physician    SW

101        0        1            0

102        1        1            0

103        0        1            1
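The Choose all that apply coding for the three participants described above can be sketched as follows. This is only an illustration under stated assumptions: the care teams come from the text, but storing each team as a Python set and the helper name to_flags are choices made for the example.

```python
# One flag variable per possible category value, as described in the text.
CATEGORIES = ["Nurse", "Physician", "SW"]

# Care teams for participants 101 to 103, taken from the example in the text.
care_teams = {
    101: {"Physician"},
    102: {"Nurse", "Physician"},
    103: {"SW", "Physician"},
}


def to_flags(team, categories=CATEGORIES):
    """Code each category as 1 if it is on the care team, otherwise 0."""
    return {cat: int(cat in team) for cat in categories}


# Build one row of two-state flags per participant.
rows = {subject: to_flags(team) for subject, team in care_teams.items()}

for subject, flags in rows.items():
    print(subject, flags["Nurse"], flags["Physician"], flags["SW"])
```

Each participant ends up with one 0 or 1 per category, so any combination of caregivers can be stored without inventing a separate code for every possible team.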

If you have variables with more than two categories, missing values theoretically

can be indicated by leaving the cell blank, but blanks are difficult to analyze in

statistical software. Instead, categories should be set up for missing values so they

can be part of the coding system (such as using a numerical code to indicate